NATURAL LANGUAGE INPUT METHOD AND APPARATUS

The present invention generally relates to a natural language input method and apparatus that allows computer usable data to be input by recognising a natural language input which includes pauses.

When inputting data in a natural language, a user can insert pauses in the natural language input which can adversely affect the recognition of the natural language input.

In particular, in speech recognition which uses context free grammars, if a user inserts pauses other than at the places expected by a speech recognition engine, e.g. at the end of a sentence, the resultant speech recognition accuracy can be adversely affected.

There are many reasons why a speaker may insert pauses during speech input, e.g. when emphasising words, where the pauses are not properly interpreted by the speech recogniser. Pauses may also occur in the speech input where actions are involved. One particular area in which this occurs is the field of multimodal data input.

In order to increase the richness with which a user can interact with a machine, it has become common for the user to be able to interact with the machine using more than one type of input device, i.e. more than one modality.



For example, it is common in speech recognition systems used on general purpose computers to allow a user to input data using a speech recognition engine, and to supplement the input of speech data with mouse data and keyboard data. Multimodal systems combine input modalities such as touch, pen, speech and gesture to allow more natural and powerful communication than any single modality would alone.

When one of the modalities comprises a channel by which natural language can be input, the interaction by a user with more than one modality at the same time means that the inputting of data using a second modality can affect the inputting of data using natural language, i.e. when a user is inputting data in a second modality, this can cause a delay in the input of natural language. For example, when a user uses a multimodal system for inputting speech and mouse events, the user may pause during speech in order to properly locate the pointer controlled by the mouse so as to generate the mouse event. This pause in the natural language input can in some instances cause errors in the recognition of the natural language input. The reason for this is that some speech recognition systems use context free grammars for the recognition process. A context free grammar defines a whole utterance (i.e. a portion of speech between pauses). Thus a pause appearing in the middle of what the recognition engine expects to be an utterance causes the recognition engine to treat the input speech as two shorter utterances. The recognition engine will thus try to match the two utterances separately to the grammar rules. This causes misrecognition.
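The failure mode can be illustrated with a deliberately simplified model. It assumes, purely for illustration, that the grammar is reduced to the set of complete utterances it accepts and that the engine matches each pause-delimited utterance independently. The names GRAMMAR, split_on_pauses and recognise are hypothetical and do not correspond to any particular speech recognition engine.

```python
# Hypothetical grammar reduced to the complete utterances it accepts.
GRAMMAR = {("fax", "this", "to", "him")}


def split_on_pauses(words, pause_after):
    """Split a word sequence into utterances at pause positions.

    pause_after holds indices i such that a pause follows words[i].
    """
    utterances, current = [], []
    for i, word in enumerate(words):
        current.append(word)
        if i in pause_after:
            utterances.append(tuple(current))
            current = []
    if current:
        utterances.append(tuple(current))
    return utterances


def recognise(words, pause_after=frozenset()):
    """Succeed only if every utterance matches a whole grammar rule."""
    return all(u in GRAMMAR for u in split_on_pauses(words, pause_after))


spoken = ["fax", "this", "to", "him"]
print(recognise(spoken))       # True: a single utterance matching the rule
print(recognise(spoken, {1}))  # False: ("fax", "this") and ("to", "him")
```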

It is therefore an object of the present invention to overcome this limitation in the prior art.

In accordance with the first aspect of the present invention there is provided a data processing apparatus for generating a modified data structure which defines modified grammar rules for recognition of a natural language input with pauses. A data structure defining grammar rules for recognition of a natural language is received and analysed to identify positions in the grammar rules at which pauses can occur in the natural language input. This is then used to generate the modified data structure.

In accordance with the present invention, there are many different pause criteria which can be used for the identification of pauses in the natural language input. The criteria can take into account the behaviour of an individual user, or whether or not other inputs are used.

The modified data structure can be generated simply by adding a form of marker or tag to the data structure to identify positions in the grammar rules at which pauses can occur in the natural language input. Alternatively or in addition, the grammar rules can be fragmented in accordance with the identified positions to generate sub grammar rules. The sub grammar rules can be arranged hierarchically to form the modified data structure.
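A minimal sketch of the fragmentation alternative follows. It assumes a grammar rule is represented as a list of word tokens in which pause positions are marked with the token "<silence>". The hierarchical arrangement shown (a top level of alternatives, each a sequence of named sub grammar rules) and generated names such as <sub1> are illustrative assumptions, not a prescribed format.

```python
SILENCE = "<silence>"  # pause marker token (assumed representation)


def fragment(marked_tokens):
    """Split a marked grammar rule into sub grammar rules at pause markers.

    Empty fragments, e.g. from a marker at the very start or end, are dropped.
    """
    fragments, current = [], []
    for token in marked_tokens:
        if token == SILENCE:
            if current:
                fragments.append(tuple(current))
            current = []
        else:
            current.append(token)
    if current:
        fragments.append(tuple(current))
    return fragments


def build_modified_rule(marked_variants):
    """Name each distinct fragment once and express every variant as a
    sequence of sub rule names (a two-level hierarchy)."""
    sub_rules = {}   # fragment (tuple of words) -> hypothetical sub rule name
    sequences = []   # top-level alternatives, each a list of sub rule names
    for variant in marked_variants:
        names = []
        for frag in fragment(variant):
            if frag not in sub_rules:
                sub_rules[frag] = f"<sub{len(sub_rules) + 1}>"
            names.append(sub_rules[frag])
        if names not in sequences:
            sequences.append(names)
    return sub_rules, sequences


variants = [
    "fax this to him".split(),
    "fax this <silence> to him".split(),
    "fax this <silence> to him <silence>".split(),
]
sub_rules, sequences = build_modified_rule(variants)
print(sequences)  # [['<sub1>'], ['<sub2>', '<sub3>']]
```

The trailing marker in the third variant yields no extra fragment, so the top level ends up allowing either <sub1> alone or <sub2> followed by <sub3>.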

Because it contains information to allow for pauses, the modified data structure can then be used for the recognition of natural language input which includes pauses, thereby providing more accurate recognition.

In the present invention the natural language input can comprise any form of natural language for communicating between people. This not only includes the conventional natural languages, e.g. English, French, German etc., but also includes other natural languages such as sign language, for which recognition can depend upon the … of natural language units, e.g. words, and grammar rules are applied for recognition.

The analysis performs a prediction to identify where pauses may be inserted in the natural language input. This enables the recognition of the natural language either with or without the pauses, i.e. it allows a user to be relaxed about pausing during the input of natural language.



The present invention is particularly suited for use with speech recognition as the first modality input. Certain speech recognition processes use grammar rules, e.g. context free grammar rules, for the recognition process. Unexpected pauses within the input speech can cause a reduction in recognition accuracy, and thus the present invention can be used for generating modified speech recognition grammars which take into account pauses within the speech.
The present invention is also particularly suited to multimodal input systems in which the first modality is a natural language input and a second modality comprises associated events, e.g. mouse clicks or gestures. In order to recognise such multimodal input, the multimodal grammar in an embodiment of the invention defines multimodal grammar rules by defining grammar rules for the recognition of a natural language in conjunction with associated events in one or more further modalities. In such a system, events in a further modality can affect the timing of the input natural language, and thus the analysis to identify where pauses can occur in the natural language can be based on events in the further modalities.

In order to enable data to be input into a multimodal system, a modified multimodal data structure defining modified multimodal grammar rules is preferably generated in addition to the modified data structure defining modified grammar rules. In the modified multimodal grammar rules the relationships between events in the or each further modality are defined in relation to the modified grammar rules.

The present invention also provides an apparatus and method for generating data in a computer usable form using the data structure. The modified data structure is used in conjunction with a natural language input for the recognition of the natural language input. An example of such a system is a speech recognition engine which utilises the modified grammar rules in order to perform a speech recognition process.

The present invention also provides an apparatus and method for generating data in a computer usable form from a multimodal input. Recognised natural language data is input together with events for one or more further modalities. Also the modified multimodal data structure is used, which defines the relationship between the modified grammar rules and the events in the or each further modality. An analysis is carried out to determine if the first modality input data and the or each further modality input data match any modified grammar rule and comply with any related events in the or each further modality. If a match is found, computer usable data can be generated in dependence upon the match.

The present invention can be embodied as a specific hardware system, or as software implemented by a processing system. Thus the present invention can be embodied as processor implementable instructions for controlling a processor, e.g. a computer program. Such instructions can be provided in physical form to a processing system on a carrier medium, e.g. a floppy disk, CD-ROM, magnetic tape or any other programmable medium, or in any form of carrier signal such as a signal carried over a computer network such as the Internet.

Embodiments of the present invention will now be described with reference to the accompanying drawings, in which:

Figure 1 is a schematic diagram of a generalised embodiment of the present invention;

Figure 2 is a schematic diagram of a second embodiment of the present invention;

Figure 3 is a schematic diagram of a general purpose computer for implementing the second embodiment of the present invention;



Figure 4 is a diagram illustrating a multimodal 
input; 

Figure 5 is a flow diagram illustrating the method of generating the modified data structure in accordance with the second embodiment of the present invention;

Figure 6 is a diagram illustrating the marking of pauses within the input data structure;

Figures 7a and 7b are diagrams illustrating the generation of the sub grammar rules;

Figures 8a and 8b illustrate the generation of the modified data structure defining the modified grammar rules;

Figure 9 is a flow diagram illustrating the generation of the modified multimodal data structure;

Figure 10 is a diagram illustrating the input multimodal data structure;

Figure 11 is a diagram illustrating the relationship between the sub grammar rules and the second modality events;

Figure 12 is a diagram illustrating the modified multimodal data structure defining the modified multimodal grammar rules;

Figures 13a and 13b are a flow diagram illustrating the generation of computer usable data from a multimodal input;



Figure 14 is a schematic diagram of a third embodiment of the present invention; and

Figure 15 is a flow diagram illustrating the operation of the third embodiment of the present invention in generating computer usable data in the form of units in the natural language, i.e. words.

The generalised embodiment of the present invention will now be described with reference to Figure 1.

A processing unit 3 in a processing system is arranged to respond to a predetermined selection of multimodal inputs. The multimodal inputs are defined by rules in a multimodal data structure. The rules employed for input to the processing unit are multimodal and are defined in terms of predetermined sequences of words in combination with associated second modality events.

An input processor 2 is provided to receive the multimodal inputs from the user. The input processor 2 attempts to match or fit the multimodal inputs to a rule to which the processing unit responds. If the multimodal input successfully matches a multimodal rule, the processing unit responds to the input in a manner dependent upon which rules it satisfies.

The multimodal input may comprise an instruction, a response to a prompt, a message, or a question. Thus the processing unit can respond to multimodal input which can comprise input data for processing by the processing unit and/or input instructions for causing the processing unit to perform a function.

In the illustrated embodiment, a user employs two separate modalities for input into the processing system, one of which is speech. The first and second modalities are interrelated such that events in the second modality depend upon events in the first modality.

Since one of the input modalities is speech, the input processor 2 makes use of a conventional speech recognition (SR) engine which analyses the input speech signal to provide a result comprising the words it has recognised. In order to perform the speech recognition, the speech recognition engine utilises grammar rules in the form of a data structure. The grammar rules define a grammar, e.g. a context free grammar, of the natural language employed by a speaker providing the speech input, corresponding to the words within the multimodal rules employed for input to the processing unit.
Since, in this embodiment, the use of the second modality input causes the user to punctuate the speech input with pauses, the use of conventional grammar rules for the speech recognition engine can reduce the performance of the speech recognition engine. The use of the second modality input in conjunction with speech input can cause a user to vary his delivery of the words, so that the user's natural speech flow can be affected. Thus grammar rules obtained simply by extracting the words from within the multimodal rule employed for input to the processing unit will not take into account the effect of the second modality. Therefore, in the present embodiment, the SR grammar rules to be used by the SR engine are obtained by a modification of the SR grammar rules within the multimodal data structure.

A data structure preprocessor 1 is provided to receive the multimodal data structure and generate a modified data structure defining modified grammar rules for the speech recognition engine and a modified multimodal data structure for use in the interpretation of the multimodal input.

In order to generate the modified data structure defining the modified grammar rules, the data structure preprocessor 1 analyses the multimodal rules defined by the multimodal data structure to determine the positions between spoken words where the speaker is expected to pause due to carrying out an action related to his need to input using the second modality. The data structure preprocessor 1 fragments each grammar rule in the multimodal rules on the basis of the pauses to form grammar sub rules. The content of the grammar sub rules plus their mutual relationships are used by the data structure preprocessor 1 to form the modified grammar rules defined by the modified data structure. In this way the modified grammar rules defined by the modified data structure accommodate the pauses influenced by the second modality input.

Thus the SR engine utilises the modified grammar rules provided by the modified data structure in order to generate a recognition result comprising a string of recognised words. The multimodal rules for input to the processing unit comprise a combination of such words and associated second modality events. The modified multimodal data structure is generated by the data structure preprocessor 1 in order to provide an improved way of identifying appropriate outputs from the SR engine in the light of their combination with recorded second modality events. Hence more accurate identification of a multimodal grammar rule can be achieved. Within the data structure preprocessor 1, the modified multimodal data structure is formed by integrating second modality events at the newly formed grammar sub rule level of the modified data structure. Thus the data structure preprocessor determines what multimodal events should be associated with each grammar sub rule within a modified grammar rule. The data structure preprocessor 1 forms the modified multimodal data structure from the determined associations.
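The association step can be sketched as follows, assuming the multimodal rule is held as (word, events) pairs and that the sub grammar rules of one alternative cover the rule's words exactly and in order. Both this representation and the function name events_per_sub_rule are hypothetical; the sketch only shows the idea of re-attaching second modality events at the sub rule level.

```python
# Hypothetical (word, events) form of the fax rule.
FAX_RULE = [("fax", []), ("this", ["mouse!click"]),
            ("to", []), ("him", ["mouse!click"])]


def events_per_sub_rule(rule, sub_rules):
    """Attach each word's second modality events to the sub rule containing it.

    sub_rules is one alternative of the modified grammar rule: a list of word
    tuples that must cover the rule's words exactly and in order.
    """
    result, pos = [], 0
    for sub in sub_rules:
        span = rule[pos:pos + len(sub)]
        if tuple(word for word, _ in span) != tuple(sub):
            raise ValueError(f"sub rule {sub} does not match the rule at word {pos}")
        events = [event for _, evs in span for event in evs]
        result.append((tuple(sub), events))
        pos += len(sub)
    if pos != len(rule):
        raise ValueError("sub rules do not cover the whole rule")
    return result


print(events_per_sub_rule(FAX_RULE, [("fax", "this"), ("to", "him")]))
# [(('fax', 'this'), ['mouse!click']), (('to', 'him'), ['mouse!click'])]
```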

The output of the speech recognition engine and the input second modality events are compared within the input processor 2 with the associations specified in the modified multimodal data structure to determine which multimodal rule is matched by the multimodal input. If a multimodal rule is matched by a multimodal input, an appropriate input is generated to the processing unit 3 in accordance with the multimodal rule matched.

In this embodiment, the processing system can comprise any processing system which requires a multimodal interface to implement a process within a processing unit 3. The functional units can be implemented either partly or wholly in software, either on a special purpose machine or on a general purpose computer.

A more detailed embodiment of the present invention will now be described with reference to Figures 2 to 13.

Figure 2 is a functional diagram illustrating a computer system allowing speech input as a first modality and mouse events as a second modality. The input is used to input data and/or instructions to a processing application 30.

A database of multimodal data structures 40 is provided to store a plurality of multimodal rules for defining inputs to the processing application 30. Also, a database of pause criteria 50 is provided to store criteria identifying how the second modality events, i.e. the mouse clicks, can affect the timing of the speech pattern of a user.

This embodiment of the present invention is provided with a data structure preprocessor 10 and an input processor 20 which operate in a similar manner to the data structure preprocessor 1 and the input processor 2 of the previously described embodiment.

The data structure preprocessor 10 includes a pause processor 100 which receives multimodal rules defined by multimodal data structures from the database of multimodal (MM) data structures 40. The pause processor 100 processes the multimodal rule defined by the multimodal data structure in accordance with the pause criteria read from the database of pause criteria 50. The pause processor 100 inserts markers into the grammar rules to identify the positions of pauses. A modified data structure former 101 receives the grammar rules with markers and fragments the marked grammar rules at the markers in order to form sub-grammar rules. The sub-grammar rules are then hierarchically arranged in dependence upon their mutual relationships in order to form modified grammar rules defined by a modified data structure.

A modified data structure store 102 is provided to store the modified data structure. This can then be made available to a speech recogniser 200, as will be described in more detail hereinafter.

The data structure preprocessor 10 also includes a modified multimodal data structure former 103 which receives the input multimodal data structures and reads the modified data structure store 102. The modified multimodal data structure former 103 determines how the mouse events should be associated with each grammar sub-rule within the modified grammar rule of the modified data structure. The modified multimodal data structure former 103 forms a modified multimodal data structure in accordance with the determined associations. A modified multimodal data structure store 104 is provided within the data structure preprocessor 10 for storing the formed modified multimodal data structure.

Thus the data structure preprocessor 10 generates modified data structures and modified multimodal data structures as described hereinabove with reference to the previous embodiment. These are used by the input processor 20 in order to derive an input for the processing application 30.

This embodiment is provided with a speech input device 60 in order to generate a speech signal which is input to a speech recogniser 200. The speech recogniser 200 carries out recognition using the modified data structure read from the modified data structure store 102 provided in the data structure preprocessor 10. The output of the speech recogniser 200 comprises a sequence of recognised words which are input to a comparator 201. Also input into the comparator 201 are mouse events generated by a mouse 70. The comparator compares the multimodal input with the modified multimodal data structures read from the modified multimodal data structure store 104 in the data structure preprocessor 10. In dependence upon the matching of the input multimodal data with the modified multimodal data structure, an input is generated for the processing application 30.
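The comparison performed by the comparator 201 might be sketched as below. This is a simplified illustration under stated assumptions: the recogniser is assumed to return each word with start and end times, every event in the rule is treated as a mouse click, each click may satisfy only one word, and no click may be left over. It checks the flat rule directly rather than walking the modified multimodal data structure, and the name matches_rule is hypothetical.

```python
def matches_rule(rule, words, clicks):
    """Check recognised words plus mouse click times against a multimodal rule.

    rule:   list of (word, events) pairs (assumed representation).
    words:  list of (text, start, end) tuples from the speech recogniser.
    clicks: list of click timestamps from the mouse.
    """
    # The recognised words must be exactly the rule's words, in order.
    if [w for w, _ in rule] != [text for text, _, _ in words]:
        return False
    unused = sorted(clicks)
    for i, (_, events) in enumerate(rule):
        # Window: after the end of the preceding word, before the start of the next.
        lower = words[i - 1][2] if i > 0 else float("-inf")
        upper = words[i + 1][1] if i + 1 < len(words) else float("inf")
        for _event in events:
            hit = next((c for c in unused if lower < c < upper), None)
            if hit is None:
                return False
            unused.remove(hit)  # each click can satisfy only one word
    return not unused  # assumption: stray clicks make the match fail


FAX_RULE = [("fax", []), ("this", ["mouse!click"]),
            ("to", []), ("him", ["mouse!click"])]
words = [("fax", 0.0, 0.4), ("this", 0.6, 0.9), ("to", 1.5, 1.7), ("him", 1.9, 2.2)]
print(matches_rule(FAX_RULE, words, [1.2, 2.5]))  # True
print(matches_rule(FAX_RULE, words, [1.2]))       # False: no click for "him"
```

Because the word windows advance monotonically through the utterance, assigning each word the earliest unused click in its window is sufficient here.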

This embodiment of the present invention is implemented using a general purpose computer, and the functional units comprise software modules implemented by a processor. Figure 3 illustrates the structure of the general purpose computer in more detail.

The computer comprises a processor 59 for implementing program code stored in the program storage 51. When the processor 59 implements the program code, the data which is stored in the data storage 52 is processed. The computer is also provided with conventional random access memory (RAM) 53 for use as working memory by the processor 59. A keyboard 54 is also provided for conventional keyboard input. A display 55 is provided for providing the user with a visual output. An audio input device 56 is provided to enable a user to input speech as the first modality input. A mouse 57 is provided as the second modality input device. The components of the computer are linked by a control and data bus 58. The processor implements a pause processor 59a by implementing pause processor code read from the program storage 51. The processor also implements a modified data structure former 59b by implementing the modified data structure former code provided in the program storage 51. The processor further implements a modified multimodal data structure former 59c by implementing the modified multimodal data structure former code stored in the program storage 51. The processor also implements a speech recognition engine 59d by implementing the speech recognition engine code stored in the program storage 51. Further, the processor 59 implements a comparator 59e by implementing the comparator code stored in the program storage 51. Also, the processor 59 implements a processing application 59f by implementing the processing application code stored in the program storage 51.

The data storage and program storage can comprise any suitable storage device, such as non volatile memory, e.g. floppy disk, hard disk, programmable read only memory devices or optical disks, or volatile memory, e.g. RAM.

It can thus be seen from this embodiment that the present invention can be implemented by supplying computer code to a general purpose computer to implement the functions. A computer program can be supplied by providing the computer program on any carrier medium, such as a storage medium, e.g. floppy disk, optical disk, magnetic tape etc., or as a signal, e.g. a signal transmitted over a network such as the Internet.

The method of operation of the data structure preprocessor 10 will now be described in more detail with reference to Figures 4 to 11.

This embodiment of the present invention will be described with reference to use with a facsimile receipt and transmission processing application.

The multimodal rules are framed in a format which is an extension of the Java Speech Grammar Format (JSGF). The JSGF accompanies the Java Speech API (Application Program Interface) as a platform independent method for Java programmers to use conventional speech recognition engines in Java programs. Version 1.0 of the JSGF was released by Sun on 26 October 1998. Under JSGF each rule is specified by naming it inside angle brackets (< >), followed by an equals sign (=) and a rule definition. The rule definition is in terms of tokens, where a token is a word which can be spoken or a sequence of words with a single combined meaning, e.g. "New York City". The JSGF is extended in the present embodiment to accommodate the second modality, i.e. mouse click events. The mouse click events are treated as tokens and the click is considered to be the information content of the token. In order for a processor to recognise which modality this token comes from, the token consisting of "click" is preceded by an exclamation mark (!), which is itself preceded by the modality, i.e. "mouse", giving overall "mouse!click". If two separate mouse channels were employed, the mouse modalities could be separately identified as "mouse1!click" and "mouse2!click" respectively. When no modality is specified before a token, that token is considered to belong to the speech modality (first modality).
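Splitting a token of this extended format into its modality and content could look like the following minimal sketch. The function name parse_token and the default label "speech" are assumptions made for illustration.

```python
def parse_token(token, default_modality="speech"):
    """Return (modality, content) for a token such as "mouse!click".

    Tokens without "!" are taken to belong to the speech (first) modality.
    """
    modality, sep, content = token.partition("!")
    if not sep:
        return default_modality, token.strip()
    return modality.strip(), content.strip()


for tok in ["fax", "mouse!click", "mouse1!click", "mouse2 ! click"]:
    print(tok, "->", parse_token(tok))
```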

In order to specify how the spoken words are coordinated with mouse click events, the ampersand symbol (&) is used to indicate that a mouse click event is associated with a particular word or token. By way of example, in the present embodiment one multimodal rule of the application multimodal data structure is defined as follows:

<fax rule> =
fax (this & mouse!click) to (him & mouse!click)

Accordingly, for this rule to be satisfied, the word "fax" must be received via the speech modality, then the word "this" must be received via the speech modality in association with a mouse click event via the second modality, then the word "to" must be received via the speech modality, and finally the word "him" must be received via the speech modality in association with a mouse click event via the second modality.
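The example rule can be turned into an ordered list of (word, associated events) pairs with a small parser such as the one below. It is a sketch that handles only the constructs used in this example (bare words and parenthesised "word & modality!event" groups), not a full JSGF parser; the function names are hypothetical.

```python
import re

# Either a parenthesised group or a bare (whitespace-delimited) token.
_ITEM = re.compile(r"\(([^()]*)\)|(\S+)")


def parse_expansion(expansion):
    """Parse e.g. 'fax (this & mouse!click) to ...' into (word, events) pairs."""
    items = []
    for group, bare in _ITEM.findall(expansion):
        if bare:
            items.append((bare, []))
            continue
        parts = [p.strip() for p in group.split("&")]
        # Remove internal spaces so "mouse ! click" and "mouse!click" agree.
        events = ["".join(p.split()) for p in parts[1:]]
        items.append((parts[0], events))
    return items


def parse_rule_definition(text):
    """Split '<name> = expansion' into the rule name and its parsed expansion."""
    name, _, expansion = text.partition("=")
    return name.strip(), parse_expansion(expansion.strip().rstrip(";"))


name, rule = parse_rule_definition(
    "<fax rule> = fax (this & mouse ! click) to (him & mouse!click)")
print(name)  # <fax rule>
print(rule)  # [('fax', []), ('this', ['mouse!click']), ('to', []), ('him', ['mouse!click'])]
```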

In the present embodiment, a mouse click event is defined as being associated with a given word if it occurs at any time after the end of the word preceding the given word and before the start of the word following the given word. This is shown schematically in Figure 4.
The pause criteria used in this embodiment and stored in the database of pause criteria 50 can comprise a general set of rules which are applicable for generating inputs for any type of processing application. Alternatively, they can be adapted for use with a particular application and corresponding multimodal data structures. In the present example the pause criteria are relatively simply defined and hence can be applied to a range of applications. The applied criteria consist of just one basic rule that is applied uniformly to each rule of the multimodal data structure. The basic pause rule is that a single pause is possible in relation to one or more words of the multimodal rule, provided one or more mouse clicks are associated with any such words in accordance with the bounded relation described above. The single pause is accommodated either directly before or directly after the corresponding words, but the possibility of a pause occurring both directly before and directly after the words is not accepted.
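Applied to a rule held as (word, events) pairs, the basic pause criterion yields, for each word with at least one associated event, three alternatives: no pause, a pause directly before the word, or a pause directly after it, but never both. In this sketch a pause position is encoded as a gap index g, meaning a pause between word g-1 and word g; this encoding and the function name pause_options are assumptions.

```python
def pause_options(rule):
    """Apply the basic pause criterion to a rule of (word, events) pairs.

    Returns {word index: [gap sets]}; gap g denotes a pause between word g-1
    and word g, so gap 0 is before the first word and gap len(rule) after
    the last.
    """
    options = {}
    for i, (_, events) in enumerate(rule):
        if events:  # only words with an associated event may attract a pause
            options[i] = [
                frozenset(),        # no pause
                frozenset({i}),     # pause directly before the word
                frozenset({i + 1}), # pause directly after the word
            ]  # "both before and after" is deliberately not offered
    return options


FAX_RULE = [("fax", []), ("this", ["mouse!click"]),
            ("to", []), ("him", ["mouse!click"])]
print(pause_options(FAX_RULE))
```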

Thus, in accordance with this specific implementation of the present invention, the data structure preprocessor 10 implements the pause processor 100, the modified data structure former 101 and the modified multimodal data structure former 103 as Java code modules. This enables the speech recogniser 200 to interface to the grammar rules stored in the modified data structure store 102 using JSGF.

The method of implementation of the pause processor 100 and the modified data structure former 101 will now be described with reference to Figures 5 to 8.

In step S1 the multimodal data structure defining MM grammar rules is received, which comprises speech recognition grammar rules and associated mouse events. The multimodal data structure is read from the database of multimodal data structures 40.

In step S2 pause criteria are received from the database of pause criteria 50. In step S3 the pause criteria are applied to the multimodal grammar rule of the multimodal data structure to establish the positions of any pauses relative to the words of the multimodal grammar rules. Assuming that the multimodal grammar rule comprises the fax rule mentioned hereinabove and as illustrated in Figure 4, the pause processor 100 analyses the words of the fax rule to locate any words with which multimodal events are associated. The word "this" is identified, together with the mouse click associated with it in the multimodal rule. Thus the pause processor 100 establishes that, in addition to no pause occurring in relation to the word "this", a pause may be located directly before the word "this" or directly after the word "this", giving three separate possibilities. The word "him" is also identified, along with the associated mouse click event. Thus the pause processor 100 establishes that, in addition to no pause occurring in relation to the word "him", a pause may be located directly before the word "him" or directly after the word "him", again giving three separate possibilities. In step S4 the pause processor 100 extracts the complete chain of words contained in the multimodal rule to form a data structure which is equivalent to a conventional grammar rule usable by an SR engine. In step S5 the pause processor 100 then marks the pauses between the words at the established positions in the data structure. Three such possibilities were established due to the word "this" and three due to the word "him". Since each possibility for "this" may arise with each possibility for "him", nine possible arrangements of pause marker positions relative to the word order of the rule arise. Such arrangements are hereinafter referred to as marker configurations, and the nine versions in the present example are shown as items 141 to 149 of Figure 6, in which the marked pauses are identified by the nomenclature <silence>. In this way a data structure is established which comprises a plurality of strings of words with pause markers. This is input into the modified data structure former 101.
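Steps S3 to S5 can be sketched as enumerating the Cartesian product of the per-word pause alternatives and writing a <silence> marker at each chosen gap. The sketch assumes the (word, events) representation of the fax rule and a hypothetical function name; its output order need not correspond to the numbering 141 to 149 used in Figure 6.

```python
from itertools import product

SILENCE = "<silence>"


def marker_configurations(rule):
    """Enumerate every marker configuration allowed by the basic pause rule."""
    words = [w for w, _ in rule]
    # Per click-associated word: no pause, pause before (gap i), pause after (gap i+1).
    per_word = [
        [frozenset(), frozenset({i}), frozenset({i + 1})]
        for i, (_, events) in enumerate(rule) if events
    ]
    configs = []
    for choice in product(*per_word):
        gaps = frozenset().union(*choice)  # gap g = between word g-1 and word g
        marked = []
        for g, word in enumerate(words):
            if g in gaps:
                marked.append(SILENCE)
            marked.append(word)
        if len(words) in gaps:  # pause after the last word
            marked.append(SILENCE)
        configs.append(" ".join(marked))
    return configs


FAX_RULE = [("fax", []), ("this", ["mouse!click"]),
            ("to", []), ("him", ["mouse!click"])]
for config in marker_configurations(FAX_RULE):
    print(config)
```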

In step S6 the modified data structure former 101
generates fragmented speech recogniser grammar rules
according to the marker positions for any pauses to form
speech recogniser grammar sub-rules. For the fax rule,
each of the marker configurations 141 to 149 is
processed. Fragmentation takes place at the pause marker
positions. Figure 7a shows the different speech
recogniser grammar sub-rules formed from the respective
marker configurations 141 to 149. Marker configuration
141 contains no pause markers, so no fragmentation
occurs and the resulting SR grammar sub-rule is merely
the same as the initial rule, i.e. <fax this to him>.
Marker configuration 142 has a pause marker at the end of
the phrase, so fragmentation in this case again leads
to an SR grammar sub-rule the same as the original rule,
namely <fax this to him>. In marker configuration 143,
a pause is marked between the words 'to' and 'him'. In
this case, since new fragments must be formed in respect
of this marked pause, two fragments are formed. The
first fragment consists of the words 'fax this to', and
the second fragment consists of the word 'him'; these
fragments form the SR grammar sub-rules <fax this to>
and <him>. In marker configuration 144, a pause is
marked between the words 'this' and 'to'. Consequently,
two further new SR grammar sub-rules are formed from the
fragments either side of the pause marker, namely
<fax this> and <to him>. Marker configuration 145 is
similar to marker configuration 144, but has a further
pause marked at the end of the initial SR grammar rule,
after the word 'him'. This in fact produces no extra
fragments compared to marker configuration 144;
consequently the two SR grammar sub-rules produced by
fragmenting marker configuration 145 are in fact the same
as for 144. In marker configuration 146, there is one
pause marked between the words 'this' and 'to', and a
further pause marked between the words 'to' and 'him'.
The resulting fragments provide the SR grammar sub-rules
<fax this>, <to> and <him>. Note that the SR grammar
sub-rule <fax this> was also produced from marker
configurations 144 and 145, and the SR grammar sub-rule
<him> was also produced from marker configuration 143,
but the SR grammar sub-rule <to> is a new SR grammar
sub-rule which was not produced by any of marker
configurations 141 to 145. Marker configuration 147 has
just one pause, which is marked between the words 'fax'
and 'this'. Fragmentation here results in two new SR
grammar sub-rules, namely <fax> and <this to him>.
Marker configuration 148 is similar to 147 but has an
additional pause marked after the word 'him'. This,
however, produces the same fragments as described for
marker configuration 147. Finally, marker configuration
149 includes a pause marker between the words 'fax' and
'this', and a pause marked between the words 'to' and
'him'. This provides three fragments, giving three SR
grammar sub-rules, which are <fax>, <this to> and <him>.
Of these three SR grammar sub-rules, both <fax> and <him>
are replications of SR grammar sub-rules produced by
previous marker configurations 141 to 148, whereas
<this to> is a further new SR grammar sub-rule.
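The fragmentation of step S6 can be sketched as splitting each marker configuration at its <silence> markers. This minimal Python sketch hard-codes the nine configurations as they are described in the text. `fragment` and `number_sub_rules` are hypothetical names. Numbering distinct fragments in order of first appearance happens to reproduce the SR sub-rule numbers 1 to 9 used in the rest of the description.

```python
from typing import Dict, List, Tuple

SILENCE = "<silence>"

# Marker configurations 141 to 149 as described in the text.
CONFIGURATIONS: Dict[int, str] = {
    141: "fax this to him",
    142: "fax this to him <silence>",
    143: "fax this to <silence> him",
    144: "fax this <silence> to him",
    145: "fax this <silence> to him <silence>",
    146: "fax this <silence> to <silence> him",
    147: "fax <silence> this to him",
    148: "fax <silence> this to him <silence>",
    149: "fax <silence> this to <silence> him",
}


def fragment(configuration: str) -> List[str]:
    """Split a configuration at each pause marker, dropping empty fragments."""
    fragments: List[str] = []
    current: List[str] = []
    for token in configuration.split():
        if token == SILENCE:
            if current:
                fragments.append(" ".join(current))
            current = []
        else:
            current.append(token)
    if current:
        fragments.append(" ".join(current))
    return fragments


def number_sub_rules(configurations: Dict[int, str]) -> Tuple[Dict[str, int], Dict[int, List[int]]]:
    """Number each distinct fragment in order of first appearance and record
    the sequence of sub-rule numbers produced by every configuration."""
    numbers: Dict[str, int] = {}
    sequences: Dict[int, List[int]] = {}
    for key in sorted(configurations):
        fragments = fragment(configurations[key])
        for text in fragments:
            numbers.setdefault(text, len(numbers) + 1)
        sequences[key] = [numbers[text] for text in fragments]
    return numbers, sequences


numbers, sequences = number_sub_rules(CONFIGURATIONS)
for text, n in numbers.items():
    print(f"SR sub-rule {n}: <{text}>")
```

Configuration 142 shows why empty fragments are dropped: its trailing marker would otherwise produce an empty sub-rule.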

The above described process produces a number of SR
grammar sub-rules. In the present case nine such
different SR grammar sub-rules have been produced, and
these are shown in Figure 7b. It is important to note
that the above described procedure does not merely
represent each combination of dividing the four words
contained in the phrase 'fax this to him'. That process
would instead have produced a further possibility of the
word 'this', which does not appear in the SR grammar
sub-rules shown in Figure 7b. It is to be appreciated
that more complicated standard SR grammar rules will
typically produce a significantly smaller number of SR
grammar sub-rules relative to the total number of
permutations of words.
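The comparison with "each combination of dividing the four words" can be checked directly. The following Python sketch enumerates every way of cutting 'fax this to him' at word boundaries and compares the resulting segments with the nine SR grammar sub-rules described above, which are hard-coded here. `all_divisions` is a hypothetical helper name.

```python
from itertools import product
from typing import List, Set

WORDS = ["fax", "this", "to", "him"]
# The nine SR grammar sub-rules of Figure 7b, as described in the text.
SUB_RULES = {"fax this to him", "fax this to", "him", "fax this", "to him",
             "to", "fax", "this to him", "this to"}


def all_divisions(words: List[str]) -> Set[str]:
    """Segments obtained from every way of cutting the phrase at word boundaries."""
    segments: Set[str] = set()
    for cuts in product((False, True), repeat=len(words) - 1):
        current = [words[0]]
        for word, cut in zip(words[1:], cuts):
            if cut:
                segments.add(" ".join(current))
                current = []
            current.append(word)
        segments.add(" ".join(current))
    return segments


segments = all_divisions(WORDS)
print(len(segments), sorted(segments - SUB_RULES))  # 10 ['this']
```

The exhaustive division yields ten segments; the pause-driven fragmentation yields nine, and the single missing segment is 'this', because no pause criterion isolates that word on both sides.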

The above processes are repeated for each multimodal
rule used for generating an input for the processing
application 30.

At step S7, the modified data structure former 101
forms a modified SR grammar rule defined by a modified
data structure from the SR grammar sub-rules. This
process comprises implementing the logical relationship
between the different SR grammar sub-rules, i.e. in the
case of the present fax rule, the relative sequences as
indicated on the right hand side of Figure 7a. This is
further illustrated for the case of the present fax rule
in Figures 8a and 8b. Figure 8a repeats the content of
the right hand side of Figure 7a except that, firstly,
identical outcomes from different marker configurations
are not duplicated and, secondly, the sub-rules are
presented in their labelled form, e.g. <SR sub-rule 1>
rather than <fax this to him>; a vertical line represents
the "or" symbol. Figure 8b represents the content of
Figure 8a except that further use is made of the "or"
symbol consisting of a vertical line. The above described
process is repeated for each multimodal rule for
generating an input for the processing application to
form respective modified SR grammar rules defined by the
modified data structure.
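The two steps from Figure 7a to Figures 8a and 8b can be sketched as follows. First, duplicate sub-rule sequences are removed. Second, alternatives sharing the same first sub-rule are grouped with the "or" symbol. This Python sketch hard-codes the sequences of sub-rule numbers for configurations 141 to 149. The textual layout it prints is only an approximation of the figures. `deduplicate` and `factor_by_first` are hypothetical names.

```python
from typing import Dict, List, Tuple

# Right-hand side of Figure 7a: sub-rule sequence for each marker configuration.
SEQUENCES: Dict[int, Tuple[int, ...]] = {
    141: (1,), 142: (1,), 143: (2, 3), 144: (4, 5), 145: (4, 5),
    146: (4, 6, 3), 147: (7, 8), 148: (7, 8), 149: (7, 9, 3),
}


def label(n: int) -> str:
    return f"<SR sub-rule {n}>"


def deduplicate(sequences: Dict[int, Tuple[int, ...]]) -> List[Tuple[int, ...]]:
    """Figure 8a: one alternative per distinct sequence, in first-seen order."""
    return list(dict.fromkeys(sequences[key] for key in sorted(sequences)))


def factor_by_first(alternatives: List[Tuple[int, ...]]) -> List[str]:
    """Figure 8b style: group alternatives that share a first sub-rule using '|'."""
    groups: Dict[int, List[Tuple[int, ...]]] = {}
    for alt in alternatives:
        groups.setdefault(alt[0], []).append(alt[1:])
    lines = []
    for first, tails in groups.items():
        if len(tails) == 1:
            lines.append(" ".join(label(n) for n in (first,) + tails[0]))
        else:
            options = " | ".join(" ".join(label(n) for n in tail) for tail in tails)
            lines.append(f"{label(first)} ({options})")
    return lines


for line in factor_by_first(deduplicate(SEQUENCES)):
    print(line)
```

This produces four lines, one for each of the starting sub-rules 1, 2, 4 and 7. These match the four alternatives of the modified SR fax rule discussed in connection with Figure 8b.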

At step S8, the modified data structure is input
into the modified data structure store 102.

Details of the processing steps involved in forming
the modified multimodal data structure will now be
described with reference to the flow diagram of Figure 9.

In step S10, the modified data structure store 102 is
accessed and the modified data structure is input into
the modified multimodal data structure former 103.

In step S11 the multimodal data structure database
40 is accessed and the multimodal data structure is also
input into the modified multimodal data structure former
103.

In step S12, the modified multimodal data structure
former 103 determines a set of modality interdependency
rules by analysing the relationship specified between the
two modalities in the multimodal data structure. For
example, in the case of the fax rule described above, the
modified multimodal data structure former 103 determines
that one mouse click event is required in association
with each of the two words "this" and "him".

As mentioned earlier, under the bounded
relationship, the timing of the mouse click associated
with a given word is specified to be at any time after
the end of the word preceding the given word and before
the start of the word following the given word.
Referring now to the pause positions employed in forming
the modified speech recogniser data structure, it is to
be appreciated that the timing definition of the mouse
clicks results in those mouse clicks also being allowed
to take place during the pauses associated with the
words, as shown schematically in Figure 10, where mouse
click configurations 171 to 179 show the mouse click
timing relationship as applied to marker configurations
141 to 149 respectively.
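The bounded relationship amounts to a simple interval test. This Python sketch assumes word start and end times are available, as in the timing-based embodiment. It uses made-up timings for "fax <pause> this to <pause> him". `click_allowed` and `TIMINGS` are hypothetical. The sketch shows that a click falling inside the pause next to a word still satisfies the relationship.

```python
from typing import List, Tuple

# Hypothetical (start, end) times in seconds for "fax <pause> this to <pause> him".
TIMINGS: List[Tuple[float, float]] = [(0.0, 0.4), (1.0, 1.3), (1.35, 1.5), (2.2, 2.5)]


def click_allowed(word_index: int, click_time: float,
                  timings: List[Tuple[float, float]]) -> bool:
    """True if the click lies after the end of the preceding word and before
    the start of the following word (unbounded at either end of the phrase)."""
    lower = timings[word_index - 1][1] if word_index > 0 else float("-inf")
    upper = timings[word_index + 1][0] if word_index + 1 < len(timings) else float("inf")
    return lower < click_time < upper


print(click_allowed(1, 0.7, TIMINGS))  # True: click for "this" in the pause before it
print(click_allowed(3, 1.9, TIMINGS))  # True: click for "him" in the pause before it
```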

Referring again to Figure 9, at step S13, the
modified multimodal data structure former 103 associates
the second modality events, i.e. mouse clicks, as
specified by the interdependency rules with the SR
grammar sub-rules, thereby forming a multimodal sub-rule
in correspondence with each SR sub-rule.

In the case of the fax rule, for example, the first
sub-rule is initially selected, i.e. SR sub-rule 1. At
step S13 it is determined whether the modality
interdependency rules define any association of a mouse
click event with the words 'fax this to him' of
sub-rule 1. Referring to Figure 4 it can be seen that
two separate mouse clicks are indeed required with this
SR grammar sub-rule, hence two mouse clicks are
associated with selected SR sub-rule 1, as shown by item
191 of Figure 11. Next, SR sub-rule 2 is selected. SR
sub-rule 2 contains the words 'fax this to'; referring to
Figure 4 it can be seen that only one mouse click,
corresponding to the word 'this', is specified to take
place in association with SR sub-rule 2. Consequently at
step S13 only one mouse click is associated with SR
sub-rule 2, as shown by item 192 of Figure 11. The
process is repeated for each grammar sub-rule from SR
sub-rule 1 to SR sub-rule 9. SR sub-rule 3 contains only
the word 'him', for which only one mouse click is
required, hence providing the association shown as item
193 in Figure 11. SR sub-rule 4 contains the words
'fax this', hence one mouse click is required due to the
word 'this', resulting in association with one mouse
click as shown by item 194 in Figure 11. Similarly, SR
sub-rule 5 results in association with one click as shown
as item 195 in Figure 11, this being derived from the
word 'him'. However, SR sub-rule 6 contains only the word
'to', which does not have any mouse click specified
therewith (see Figure 4). Consequently, as shown by item
196 of Figure 11, no association with a mouse click is
allocated by the modified multimodal data structure
former 103 to SR sub-rule 6. The only word contained by
SR sub-rule 7 is the word 'fax', which also has no mouse
clicks associated therewith. SR sub-rule 8 contains the
words 'this to him', hence it has two mouse clicks
associated therewith. SR sub-rule 9 contains the words
'this to', hence the mouse click associated with the word
'this' is associated with SR sub-rule 9. The above
process is repeated for each rule of the modified
sub-rule data structure.
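For the fax rule, the association of step S13 reduces to counting, for each SR sub-rule, the words that carry a mouse click. This Python sketch assumes the interdependency rules are simply "one click per occurrence of 'this' or 'him'", as determined in step S12. `required_clicks` is a hypothetical name.

```python
from typing import Dict, Set

CLICK_WORDS: Set[str] = {"this", "him"}  # from the fax rule's interdependency rules
SUB_RULES: Dict[int, str] = {1: "fax this to him", 2: "fax this to", 3: "him",
                             4: "fax this", 5: "to him", 6: "to", 7: "fax",
                             8: "this to him", 9: "this to"}


def required_clicks(sub_rules: Dict[int, str], click_words: Set[str]) -> Dict[int, int]:
    """Associate with each SR sub-rule one click per click-bearing word it contains."""
    return {n: sum(1 for w in words.split() if w in click_words)
            for n, words in sub_rules.items()}


print(required_clicks(SUB_RULES, CLICK_WORDS))
# {1: 2, 2: 1, 3: 1, 4: 1, 5: 1, 6: 0, 7: 0, 8: 2, 9: 1}
```

The counts agree with the associations described for items 191 to 196 and for SR sub-rules 7 to 9.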

In step S14, the modified multimodal data structure
former 103 forms a modified multimodal data structure
comprising the SR sub-rules and the second modality
(i.e. mouse click) event associations derived for all the
rules. At step S15, the modified multimodal data
structure is input to the modified multimodal data
structure store 104. The form of the modified multimodal
data structure is illustrated in Figure 12.

Thus the data structure preprocessor 1 generates
both a modified data structure holding modified grammar
rules for use by the speech recogniser 200 and also a
modified multimodal data structure for use in the
analysis of the multimodal input by the comparator 201 in
order to generate an input for the processing application
30.

Details of the processing steps carried out by the
comparator 42 during the operation of the input processor
2 will now be described with reference to the flow
diagram of Figures 13a and 13b.

In step S20, the grammar sub-rules in the modified
data structure are loaded into the speech recogniser
(SR) 200. In step S21 a grammar rule counter n is set to
1. The grammar rule counter determines the position of
a grammar sub-rule in a sequence for computing the
modified SR grammar rule. Figure 8b shows the modified SR
fax rule organised into four alternatives, where each
alternative has a unique SR grammar sub-rule at its
logical start (n=1). In the case of the first
alternative, for recognition of the whole modified SR fax
rule to take place, <SR sub-rule 1> needs to be
recognised. In the case of the second alternative, <SR
sub-rule 2> needs to be recognised, following which <SR
sub-rule 3> needs to be recognised. This is represented
by the second line in the equation form of Figure 8b. In
the case of the third alternative, <SR sub-rule 4> needs
to be recognised, followed by the indicated variations
with respect to <SR sub-rule 5>, <SR sub-rule 6> and <SR
sub-rule 3> being recognised. This is represented by
the third line of the equation form of Figure 8b. In the
case of the fourth and last alternative of the present
example, <SR sub-rule 7> needs to be recognised, followed
by the indicated combinations of <SR sub-rule 8>, <SR
sub-rule 9> and <SR sub-rule 3> being recognised. This
is represented by the fourth and final line of the
equation form of Figure 8b. Thus in the example shown in
Figure 8b, there are four first SR grammar sub-rules
(n=1), namely SR sub-rules 1, 2, 4 and 7. One of these
will be given by the SR as the best match in step S22.

The procedure will now be described for the case
when, for the modified SR fax rule, the best match given
for a starting rule is SR sub-rule 2, containing the
words 'fax this to'. At step S23, comparator 201
determines any associated second modality events required
for that SR grammar sub-rule. The comparator 201
determines, from the modified multimodal data structure
it has received, the requirement that for SR sub-rule 2
one mouse click is required to have been input during the
corresponding time portion of the audio input that
provided the recognition result.

At step S24 the actual mouse click inputs by the
operator using mouse 70 are analysed by the comparator
201 to determine which, if any, of said events occurred
in the time period corresponding to the relevant audio
input. At step S25, the comparator 201 determines
whether the above described required mouse click events
are consistent with the actual received events. If they
are not consistent, then the comparator 201 allows a
time-out period to allow for input not yet properly
processed or, in the case of a mouse click due to appear
at the end of a pause, not yet received. After expiry of
the time-out period, at step S27 the comparator 201 once
again carries out a determination step as to whether the
received and required mouse click inputs are consistent.
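Steps S24 to S27 can be sketched as a check that is repeated once after a time-out. This is a minimal Python sketch. It assumes that the comparator can re-read the list of received click timestamps and that the time-out is a blocking wait supplied by the caller. `events_consistent`, `get_clicks` and `wait_for_timeout` are hypothetical names, and treating both window bounds as inclusive is an assumption.

```python
from typing import Callable, Sequence, Tuple


def events_consistent(required: int,
                      window: Tuple[float, float],
                      get_clicks: Callable[[], Sequence[float]],
                      wait_for_timeout: Callable[[], None]) -> bool:
    """Compare required and received mouse clicks for one recognised sub-rule.

    window is the (start, end) time span of the audio that gave the
    recognition result; both bounds are treated as inclusive here.
    """
    start, end = window

    def received() -> int:
        return sum(1 for t in get_clicks() if start <= t <= end)

    if received() == required:      # step S25: consistent straight away
        return True
    wait_for_timeout()              # time-out: allow late clicks to arrive (S26)
    return received() == required   # step S27: check again
```

In use, a click that is stamped inside the window but delivered late still counts after the wait. For example, with `clicks = [0.5]` and a wait that appends `1.1`, `events_consistent(2, (0.0, 1.2), lambda: clicks, lambda: clicks.append(1.1))` returns `True`.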

If they are still not consistent, then at step S28 it
determines whether there are any further candidate
untried matches for the initial SR grammar sub-rules. If
there are none, in step S30 it is determined whether the
SR grammar sub-rule being processed is the first in the
sequence and, if so, then at step S36 the output result is
set as "error". If the SR grammar sub-rule being
processed is not the first in the sequence, in step S31
a previous SR grammar sub-rule is tried and in step S28
it is once again determined whether all matches have been
tried.

If at step S28 there are further best matches of SR
grammar sub-rules which have not been tried, then the
comparator 201 receives the next best match for an nth SR
grammar sub-rule from the speech recogniser 200 in step
S29, and the process returns to step S23.

In the above described procedure, if at step S25 or
step S27 the received and required second modality events
were in fact consistent, then at step S32 the nth SR
grammar sub-rule thus processed is stored as a currently
identified nth SR grammar sub-rule. Thus in the present
example, SR sub-rule 2 containing the words 'fax this to'
is so identified.

At step S33 the comparator 201 determines whether a
modified SR grammar rule is completed with the identified
nth SR grammar sub-rule. In the present example of the
modified SR fax rule, had SR sub-rule 1 been identified
as the correct starting SR grammar sub-rule, then indeed
the overall modified SR fax rule would have been
satisfied. In this case the next step would have been
S34, in which a result is set as matched to the
corresponding SR grammar rule. However, in the present
example SR sub-rule 2 does not fulfil the complete
modified SR grammar rule, and hence the process moves to
step S37, in which n is incremented. The process then
returns to step S22 to receive the best match for the
next SR grammar sub-rule in the sequence.

In the present case, where the initial SR grammar
sub-rule (n=1) is SR sub-rule 2, if SR sub-rule 3 is the
best match for the next SR grammar sub-rule (n=2), then at
step S23 the comparator 201 determines from the modified
multimodal data structure whether any mouse click events
are required for SR sub-rule 3. It will be determined
that one mouse click is required during the time period
of the audio input that has been recognised as SR
sub-rule 3.

At step S24, the comparator 201 determines whether
any such mouse click event was indeed received from the
mouse 70 during the appropriate time. At step S25, the
comparator 201 determines whether the received and
required versions are consistent. If not, then the
comparator 201 allows a time-out period for any
outstanding mouse click events to be processed or
received, and thereafter at step S26 determines whether
any such events have been received during the time-out
period. At step S27 the comparator 201 determines
whether the updated version of the received results is
now consistent with the required results. If not, then
at step S28 it is determined whether there are further
untried matches for this next SR grammar sub-rule (n=2).
If so, in step S29 the next best match for this next SR
grammar sub-rule is received from the speech recogniser
200 and the process returns to step S23. Thereafter the
process is repeated as described above.

If at S28 there were instead no more untried matches
for this next SR grammar sub-rule (n=2) available from
the speech recogniser 200, then in terms of the overall
procedure the currently identified starting SR grammar
sub-rule is inadequate. Consequently the comparator 201
attempts to identify a more suitable preceding SR grammar
sub-rule by decrementing the counter n (step S31) and
returning to step S28, where it first determines whether
any untried matches for the preceding SR grammar sub-rule
are available. If they are, then the overall process is
repeated starting again at step S29, i.e. a next best
match for the preceding SR grammar sub-rule is received
and the process continues from there.

If at step S30 no more matches of initial SR grammar
sub-rules are available, in other words all combinations
of matches of initial SR grammar sub-rules and
consequential following SR grammar sub-rules have been
exhausted, then at step S36 the result is set as
"error". This would mean that no satisfactory speech
recognition result has been achieved that is also
consistent with the received mouse click events.

Returning now to the processing of the best match
for the next SR grammar sub-rule (n=2), if at either of
steps S25 or S27 it is determined that the received and
required results are in fact consistent, then the next
step carried out is S32, in which the next SR grammar
sub-rule (n=2) whose recognition result has been
determined as consistent is stored as the currently
identified 2nd (nth) SR grammar sub-rule.

At step S33, it is determined whether the currently
identified initial SR grammar sub-rule followed by the
currently identified second SR grammar sub-rule together
form a completed modified SR grammar rule. In the
present example, where the currently identified initial
SR grammar sub-rule is SR sub-rule 2 and SR sub-rule 3 has
since been identified as the next SR grammar sub-rule,
the whole modified SR fax rule is indeed completed,
since SR sub-rule 2 followed by SR sub-rule 3 represents
the second alternative shown in Figure 8b. If, however,
in another example SR sub-rule 4 was identified as the
currently identified initial SR grammar sub-rule, and
thereafter SR sub-rule 6 was identified as the currently
identified next SR grammar sub-rule, then, as can be seen
from the third alternative of Figure 8b, the result so
far is favourable, but nevertheless a further following
SR grammar sub-rule, namely SR sub-rule 3, is still
required to complete the modified SR fax rule consisting
of SR sub-rule 4 followed by SR sub-rule 6 followed by
SR sub-rule 3. In this case the process would return to
step S37 to increment n and then return to step S22.
Then, so long as the speech recogniser 200 provides a
recognition result for SR sub-rule 3 as the best match of
a third SR grammar sub-rule, the process will continue
again from S23 to verify the consistency of that
recognition result with the corresponding mouse click
event received.

After the comparator 201 determines, at step S33,
that a modified SR grammar rule has been completed, at
step S34 it sets the result as matched to the
corresponding modified SR grammar rule.

The final step is that shown at S35, in which the
result, be it a successful match via step S34 or an
"error" message via step S36, is used to generate an input
for the processing application 30. The input can be data
provided by the multimodal input and/or instructions as
interpreted in accordance with the modified multimodal
grammar rules.

It can thus be seen that this specific embodiment
provides improved speech recognition when using a
multimodal input, since the second modality can be used to
improve the speech recognition result by predicting when
pauses may be inserted in the first modality input whilst
data is being input using the second modality. The search
of the possible matching sub-grammars is achieved, as
illustrated in Figures 13a and 13b, using a process to
search the "tree structure" of the sub-grammar rules.
Branches from initial sub-grammar rules to next
sub-grammar rules can be explored and, if not successful,
a new initial sub-grammar rule can be tried together with
the branches therefrom to next sub-grammar rules.
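The search of Figures 13a and 13b can be sketched as a depth-first search with backtracking. This is a simplified Python sketch under several assumptions.

- The recogniser's n-best matches are given up front for each position. A real recogniser would supply them one at a time in steps S22 and S29, and they could depend on earlier choices.
- Each hypothesis carries a hypothetical time window that includes any adjoining pause.
- Clicks are counted inside that window.
- The time-out of steps S26 and S27 is omitted.

`search`, `clicks_consistent` and the data are illustrative only.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Expanded alternatives of the modified SR fax rule (Figure 8b), by sub-rule number.
ALTERNATIVES: List[Tuple[int, ...]] = [(1,), (2, 3), (4, 5), (4, 6, 3), (7, 8), (7, 9, 3)]

# Mouse clicks required for each SR sub-rule (Figure 11).
REQUIRED_CLICKS: Dict[int, int] = {1: 2, 2: 1, 3: 1, 4: 1, 5: 1, 6: 0, 7: 0, 8: 2, 9: 1}

# Hypothetical recogniser hypothesis: (sub-rule number, window start, window end).
Hypothesis = Tuple[int, float, float]


def clicks_consistent(hyp: Hypothesis, click_times: Sequence[float]) -> bool:
    """Steps S23 to S25: clicks received in the window must equal the clicks required."""
    label, start, end = hyp
    received = sum(1 for t in click_times if start <= t < end)
    return received == REQUIRED_CLICKS[label]


def search(nbest: Dict[int, List[Hypothesis]],
           click_times: Sequence[float],
           prefix: Tuple[Hypothesis, ...] = ()) -> Optional[Tuple[int, ...]]:
    """Depth-first search with backtracking over the tree of sub-rules.

    Returns the matched sub-rule sequence, or None once every combination
    has been exhausted (the "error" result of step S36).
    """
    position = len(prefix)  # zero-based; the counter n is position + 1
    for hyp in nbest.get(position, []):  # steps S22/S29: next best match
        labels = tuple(h[0] for h in prefix) + (hyp[0],)
        if not any(alt[:len(labels)] == labels for alt in ALTERNATIVES):
            continue  # not a continuation of any alternative
        if not clicks_consistent(hyp, click_times):
            continue  # inconsistent: try the next untried match (S28)
        if labels in ALTERNATIVES:
            return labels  # steps S33/S34: a complete alternative is matched
        found = search(nbest, click_times, prefix + (hyp,))  # step S37: n += 1
        if found is not None:
            return found
    return None  # no untried matches left: back up to the previous position (S31)


# Hypothetical n-best lists: sub-rule 1 needs two clicks but only one falls in
# its window, so the search moves on to sub-rule 2 and then completes with 3.
NBEST = {0: [(1, 0.0, 1.2), (2, 0.0, 1.2)], 1: [(5, 1.2, 2.2), (3, 1.2, 2.2)]}
print(search(NBEST, click_times=[0.5, 1.9]))  # (2, 3)
```

Recursion keeps the backtracking implicit. When no candidate at a position succeeds, the call returns `None`, and the caller moves on to its own next untried match. This mirrors the decrement of n in step S31.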

In the above described detailed embodiment, the
pause criteria consisted of the simple general rule that
a single pause is accommodated either directly before or
directly after a word or words, provided one or more mouse
clicks are associated in the bounded relation with the
word or words. Even when maintaining such a simple form of
pause criteria specification, a number of variations are
possible. One detail in the above embodiment was that it
was not possible for a pause to occur both directly
before the word and directly after the word. However, in
an alternative embodiment, the pause may indeed occur
both directly before and directly after the word.
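The alternative criterion changes the number of marker configurations. The following Python sketch compares the two criteria for the fax rule. It assumes the same click-associated words ("this" and "him") as above and adds a hypothetical "both" option. `configurations` is an illustrative name.

```python
from itertools import product
from typing import Set

WORDS = ["fax", "this", "to", "him"]
CLICK_WORDS = {"this", "him"}
SILENCE = "<silence>"


def configurations(allow_both: bool) -> Set[str]:
    """Distinct marker configurations for the fax rule under either pause criterion."""
    options = ["none", "before", "after"] + (["both"] if allow_both else [])
    targets = [i for i, w in enumerate(WORDS) if w in CLICK_WORDS]
    results: Set[str] = set()
    for choice in product(options, repeat=len(targets)):
        pause_before = {i for i, o in zip(targets, choice) if o in ("before", "both")}
        pause_after = {i for i, o in zip(targets, choice) if o in ("after", "both")}
        tokens = []
        for i, word in enumerate(WORDS):
            if i in pause_before and (not tokens or tokens[-1] != SILENCE):
                tokens.append(SILENCE)
            tokens.append(word)
            if i in pause_after:
                tokens.append(SILENCE)
        results.add(" ".join(tokens))
    return results


print(len(configurations(False)), len(configurations(True)))  # 9 16
```

With k click-associated words, the count grows from 3^k to 4^k configurations. For this rule, that means 16 instead of 9.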

In the above embodiment the same pause criteria are
applied to all of the rules of the application multimodal
data structure. In other embodiments different pause
rules can be set for different rules of the multimodal
data structure. Different pause rules could be ascribed
based on the classification of the type of rule in the
multimodal data structure.

All of the above alternatives broadly speaking
represent a type of automatic pause criteria
specification in which the required operations are
predetermined. In other embodiments, in addition to or
instead of such automatic pause criteria, customised
pause criteria can be input by an operator as required.
Such input by an operator may be made available either on
a rule by rule basis, or as a customised input that is
applicable to all of the rules. Certain pause criteria
can be input based on a response of a user to a query,
where the query is put to the user in a format understood
by the user and does not specifically detail anything
about the pause criteria process as such. For example,
the application may present a number of queries such as
'Do you wish speech processing to allow long pauses when
making mouse clicks?'. Alternatively, combinations of
automated and customised pause criteria can be formulated
by the processor using algorithms employing historical
data taken whilst monitoring a user's use of such a
system and adapting the pause criteria to the particular
traits of a particular user. Such trait matching could
also be achieved by a profile input by a user.

In the above embodiment, individual words of the
natural language, i.e. English, form the basis of the
pause criteria in view of the use of mouse clicks to
identify details related to the spoken words. In other
embodiments, key types of word or data blocks that
generally are associated with mouse click events could be
used. Furthermore, when the second modality is a
modality other than mouse clicks, this may in itself lead
to particular types of grammar structure or units being
the likely causes of pauses.

In the above embodiment a bounded relationship was
employed to define an association between a multimodal
event and a spoken word. In other embodiments, different
definitions can be specified. One possibility is that
the mouse click event must occur whilst the word is
actually being spoken.

In the above embodiment the entire modified data
structure is entered into the speech recogniser from the
modified data structure store 102 before the speech
recogniser 200 processes speech input. Alternatively, a
standard Speech Application Programming Interface (SAPI)
is used, enabling just a portion of the modified data
structure to be transferred initially to the speech
recogniser 200. Then, based on feedback results from the
speech recogniser 200 to a processor controlling the
modified data structure store 102, further parts of the
modified data structure are transferred to the speech
recogniser 200 as required. In the latter embodiment,
for example, for the case of the SR fax rule only the
starting SR grammar sub-rules, i.e. sub-rules 1, 2, 4 and
7, are transferred initially to the speech recogniser.
Then, depending upon the progress of the procedure shown
in Figures 13a and 13b, particular following SR grammar
sub-rules are transferred to the speech recogniser 200 as
required. A further alternative is that the whole modified
data structure is transferred to the speech recogniser
200 initially, but only some of the SR grammar sub-rules
are initially activated within the speech recogniser.
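The choice of which sub-rules to transfer or activate next can be derived from the alternatives of the modified rule. This Python sketch does not use any SAPI call. It only computes which sub-rules may follow the sub-rules identified so far for the fax rule. `sub_rules_to_activate` is a hypothetical name.

```python
from typing import List, Tuple

# Expanded alternatives of the modified SR fax rule (Figure 8b), by sub-rule number.
ALTERNATIVES: List[Tuple[int, ...]] = [(1,), (2, 3), (4, 5), (4, 6, 3), (7, 8), (7, 9, 3)]


def sub_rules_to_activate(prefix: Tuple[int, ...]) -> List[int]:
    """Sub-rules that can follow the sub-rules identified so far, in first-seen order."""
    n = len(prefix)
    following = [alt[n] for alt in ALTERNATIVES if len(alt) > n and alt[:n] == prefix]
    return list(dict.fromkeys(following))


print(sub_rules_to_activate(()))      # [1, 2, 4, 7]: transferred initially
print(sub_rules_to_activate((4,)))    # [5, 6]
print(sub_rules_to_activate((4, 6)))  # [3]
```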

In the above described specific embodiment, timing
details of the words recognised by the speech recogniser
200, more particularly a start time and an end time, are
used in conjunction with the time record of the mouse
click events to determine whether the required
association has occurred. In another embodiment, instead
of an absolute time basis, the association can be based
merely on the required sequential number of second
modality events occurring, e.g.

speech input: fax this to him

mouse input: click click

In other embodiments the processing operations are
implemented in systems other than the computer
arrangement described. For example, the speech
recogniser can be a completely separate entity from other
processing units. Similarly, the data structure
preprocessor 1 may be arranged in a different module from
the input processor 2. Indeed, any of the above
described functions can be implemented in a suitable type
of processing arrangement, including distributed
arrangements.

In other embodiments, modalities other than mouse
clicks can form the second modality. Possibilities
include keyboard input, gestures (for example via
suitable video camera inputs, in particular pointing),
touch screen inputs, and so on. Also, more than one
modality other than speech can be accommodated. For
example, in addition to the first modality of speech, a
second modality consisting of mouse click events and a
third modality consisting of gestures can be included.
Also, two different channels of a similar type of
modality can be accommodated, for example right mouse
button clicks and left mouse button clicks.

The first modality need not be speech; rather, it is
merely limited to being any input in the form of a
natural language which is to be recognised using grammar
rules and for which the temporal relationship of tokens,
e.g. words, is important. For example, sign language as
used by deaf people could form the first modality in
another embodiment of the present invention.

The above embodiments provide improvements in
accommodating pauses that arise in a natural language
input due to the interaction with further modalities
other than that conveying the natural language input.
Fundamentally, the further modalities impose restrictions
on the proper flow of the natural language input and can
affect the recognition result for the natural language
input modality.

In its broadest aspects the present invention is not
limited to multimodal inputs. The present invention is
also applicable to a single modal input comprising a
natural language input which is to be recognised using
grammar rules and for which the temporal relationships
between tokens, e.g. words, are important. A primary
example of such an input is speech, although another
example is sign language.

For such an input, a user may insert pauses
inadvertently either when emphasising something or due to
a particular style of input, e.g. a particular style of
speech. The present invention is capable of compensating
for this by generating modified grammar rules in the form
of a modified data structure for use by the input
recognition engine.

An embodiment receiving a single modal input will
now be described with reference to Figures 14 and 15.
This embodiment is able to generate modified speech
recognition grammar rules to take into account pauses
inserted by a user. In a conventional speech recognition
engine, when recognition is carried out on speech in
which pauses occur other than at the end of a sentence,
incorrect recognition can result.

Referring now to Figure 14, a data structure store
120 is provided to store a data structure defining a
speech recognition grammar. A pause criteria store 130
is provided to store criteria for defining where pauses
can occur in the grammar rules.

A data structure preprocessor 110 is ^provided ^for 
-reading SR grammar *ru-lefs from -.the -data structure store 
-140 and^pause criteria f rom the .pause criteria store 130 . 
Using the pause criteria, the data structure can be
modified for use by a speech recognition engine which is
provided in an input processor 140. For speech
recognition engines which can accept tokens defining
silence, the modified data structure generated by the
data structure preprocessor 110 can simply comprise the
speech recognition grammar read from the data structure
store 120 with the pause or silence markers inserted
therein. Alternatively, for speech recognition engines
which do not recognise tokens identifying pauses or
silence, the data structure preprocessor 110 can carry
out a further grammar rule modification step of
fragmenting the grammar rules into speech recognition
grammar sub-rules as described hereinabove with regard to
the embodiment concerned with multimodal grammar rules.
Thus, the speech recognition engine within the input
processor 140 can be provided with speech recognition
grammar sub-rules. This will enable the speech
recognition engine to more accurately carry out
recognition on input speech. Thus the input processor
140 is able to more accurately generate an input for a
processing unit 150 to receive the result of the speech
recognition, e.g. data and/or commands.
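The two modification strategies described above (inserting silence markers, or fragmenting a rule into sub-rules) can be sketched as follows. This is a minimal illustration under stated assumptions, not the embodiment's implementation: a grammar rule is assumed to be a flat list of words, the pause criteria are assumed to have already been reduced to a set of word indices at which a pause may occur, and the `<SIL>` token and the sub-rule naming scheme are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical silence token; a real engine defines its own marker syntax.
SILENCE = "<SIL>"


def insert_pause_markers(rule: List[str], pause_positions: Set[int]) -> List[str]:
    """Marker strategy: put a silence token before rule[i] for every interior
    position i in pause_positions (i.e. a pause may follow rule[i - 1])."""
    marked: List[str] = []
    for i, token in enumerate(rule):
        if i > 0 and i in pause_positions:
            marked.append(SILENCE)
        marked.append(token)
    return marked


def fragment_rule(name: str, rule: List[str], pause_positions: Set[int]) -> Dict[str, List[str]]:
    """Fragmentation strategy: cut the rule at each interior pause position
    into sub-rules, and add a top-level rule that lists the sub-rules in
    order, giving a two-level hierarchy."""
    cuts = sorted(p for p in pause_positions if 0 < p < len(rule))
    bounds = [0] + cuts + [len(rule)]
    grammar: Dict[str, List[str]] = {}
    top: List[str] = []
    for k, (start, end) in enumerate(zip(bounds, bounds[1:]), start=1):
        sub_name = f"<{name} sub-rule {k}>"
        grammar[sub_name] = rule[start:end]
        top.append(sub_name)
    grammar[f"<{name}>"] = top
    return grammar
```

For a hypothetical rule `fax this to John` with a pause allowed after `this` (position 2), the marker strategy yields `fax this <SIL> to John`, while fragmentation yields `<Fax sub-rule 1>` = `fax this`, `<Fax sub-rule 2>` = `to John` and a top-level `<Fax>` = `<Fax sub-rule 1> <Fax sub-rule 2>`, which an engine without silence tokens can still use.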

The method of operation of this embodiment of the
present invention will now be described with reference to
the flow diagram of Figure 15.

In step S50 the data structure defining grammar
rules for generating an input to a processing unit is
input from the data structure store 120. In step S51 a
modified data structure defining fragmented or marked
grammar rules is determined using the speech pause
criteria read from the pause criteria store 130. In step
S52 the modified data structure is used by the speech
recognition engine to recognise input speech and in step
S53 the recognised words are input into the processing
unit 150 either as data or commands.
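Steps S50 to S53 can be traced end to end with the following toy pipeline. It is a sketch under loud assumptions: grammars are dictionaries of word lists, pause criteria are given per rule as word indices, only the fragmentation strategy is modelled, and `s52_recognise` is a hypothetical stand-in for a speech recognition engine that receives already-recognised words split at the pauses rather than audio.

```python
from typing import Callable, Dict, List, Optional, Sequence

Grammar = Dict[str, List[str]]            # rule name -> words (assumed format)
Fragmented = Dict[str, List[List[str]]]   # rule name -> ordered sub-rules


def s50_load(store: Grammar) -> Grammar:
    """S50: read the grammar rules from the data structure store."""
    return {name: list(words) for name, words in store.items()}


def s51_modify(grammar: Grammar, pause_criteria: Dict[str, List[int]]) -> Fragmented:
    """S51: fragment each rule at the interior word indices where the pause
    criteria say a pause may occur."""
    modified: Fragmented = {}
    for name, words in grammar.items():
        cuts = sorted(p for p in set(pause_criteria.get(name, [])) if 0 < p < len(words))
        bounds = [0] + cuts + [len(words)]
        modified[name] = [words[a:b] for a, b in zip(bounds, bounds[1:])]
    return modified


def s52_recognise(modified: Fragmented, segments: Sequence[List[str]]) -> Optional[str]:
    """S52 (toy): accept a rule if every pause-delimited segment is a run of
    whole, consecutive sub-rules, i.e. pauses fall only on sub-rule boundaries."""
    for name, subs in modified.items():
        i, ok = 0, True
        for segment in segments:
            run: List[str] = []
            while i < len(subs) and len(run) < len(segment):
                run += subs[i]
                i += 1
            if run != list(segment):
                ok = False
                break
        if ok and i == len(subs):
            return name
    return None


def s53_dispatch(result: Optional[str], processing_unit: Callable[[str], None]) -> None:
    """S53: pass the recognised result to the processing unit as a command."""
    if result is not None:
        processing_unit(result)
```

With a hypothetical store `{"fax": ["fax", "this", "to", "John"]}` and a pause allowed at index 2, the segments `[["fax", "this"], ["to", "John"]]` and the unpaused `[["fax", "this", "to", "John"]]` both resolve to `"fax"`, whereas `[["fax"], ["this", "to", "John"]]` resolves to `None` because that pause does not fall at an identified position.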

Although the present invention has been described
hereinabove with reference to specific embodiments, the
present invention is not limited to these embodiments and
it will be apparent to a skilled person in the art that
modifications can be made without departing from the
spirit and scope of the present invention.



CLAIMS: 
1. Data processing apparatus comprising:

receiving means for receiving a data structure
defining grammar rules for recognition of a natural
language input;

analysing means for analysing the data structure to
identify positions in the grammar rules at which pauses
can occur in the natural language input; and

generating means for generating a modified data
structure defining modified grammar rules for recognition
of a natural language input with pauses therein.

2. Data processing apparatus according to claim 1,
wherein said analysing means is adapted to identify the
positions in accordance with pause criteria for the
natural language input.

3. Data processing apparatus according to claim 1 or
claim 2, wherein said generating means is adapted to add
marker means to the identified positions in the grammar
rules at which pauses can occur in the natural language
input to generate the modified data structure.

4. Data processing apparatus according to any preceding
claim, wherein said generating means is adapted to
fragment the grammar rules in accordance with said
identified positions to generate sub grammar rules to
form said modified data structure.

5. Data processing apparatus according to claim 4,
wherein said generating means is adapted to form a
hierarchical structure using said sub grammar rules to
form said modified data structure.

6. Data processing apparatus according to any preceding
claim, wherein said receiving means is adapted to receive
a data structure defining grammar rules for use in speech
recognition of a natural language speech input, and said
generating means is adapted to generate said modified
data structure defining modified grammar rules for speech
recognition of a natural language speech input with
pauses therein.

7. Data processing apparatus according to any preceding
claim, wherein said receiving means is adapted to receive
said data structure defining grammar rules for
recognition of a natural language input as a first
modality input in conjunction with associated events in
at least one further modality input, said data structure
defining the association between events in each modality
input, and events in said first modality input comprising
units in the natural language.

8. Data processing apparatus according to claim 7,
wherein said analysing means is adapted to identify said
positions in the grammar rules based on events in at
least one said further modality input.

9. Data processing apparatus according to claim 7 or
claim 8, wherein said generating means is adapted to
generate a further modified data structure defining said
modified grammar rules and the relationships with events
in the or each further modality input.

10. A data processing method comprising the steps:

receiving a data structure defining grammar rules
for recognition of a natural language input;

analysing the data structure to identify positions
in the grammar rules at which pauses can occur in the
natural language input; and

generating a modified data structure defining
modified grammar rules for recognition of a natural
language input with pauses therein.




11. A data processing method according to claim 10,
wherein the analysing step includes identifying the
positions in accordance with pause criteria for the
natural language input.

12. A data processing method according to claim 10 or
claim 11, wherein the generating step includes adding
marker means to said data structure to identify the
positions in the grammar rules at which pauses can occur
in the natural language input.

13. A data processing method according to any one of
claims 10 to 12, wherein the generating step includes
fragmenting the grammar rules in accordance with said
identified positions to generate sub grammar rules to
form said modified data structure.

14. A data processing method according to claim 13,
wherein the generating step includes forming a
hierarchical structure using said sub grammar rules to
form said modified data structure.

15. A data processing method according to any one of
claims 10 to 14, wherein the receiving step comprises
receiving a data structure defining grammar rules for use
in speech recognition of a natural language speech input,
and the generating step comprises generating a modified
data structure defining modified grammar rules for speech
recognition of a natural language speech input with
pauses therein.

16. A data processing method according to any one of
claims 10 to 15, wherein the receiving step comprises
receiving a data structure defining grammar rules for
recognition of a natural language input as a first
modality input in conjunction with associated events in
at least one further modality input, said data structure
defining the association between events in each modality
input, events in said first modality input comprising
units in the natural language.

17. A data processing method according to claim 16,
wherein the analysing step comprises identifying said
positions in the grammar rules based on events in at
least one said further modality input.

18. A data processing method according to claim 16 or
claim 17, wherein the generating step includes generating
a further modified data structure defining said modified
grammar rules and the relationships with events in the or
each further modality input.

19. Apparatus for generating data in a computer usable
form, the apparatus comprising:

receiving means for receiving a natural language
input with a number of pauses therein; and

recognition means for recognising said natural
language input using the modified data structure
generated using the method of any one of claims 10 to 18
to generate data in computer usable form.

20. Apparatus according to claim 19, wherein said
recognition means comprises speech recognition means for
recognising a natural language speech input.

21. A method of generating data in a computer usable
form, the method comprising receiving a natural language
input with a number of pauses therein; and

recognising said natural language input using the
modified data structure generated using the method of any
one of claims 10 to 18 to generate data in computer
usable form.
22. A method according to claim 21, wherein the
recognising step comprises speech recognition of a
natural language speech input.

23. Apparatus for generating data in a computer usable
form, the apparatus comprising:

first modality receiving means for receiving data
generated for a natural language input by the apparatus
of claim 19 or claim 20, said data comprising recognised
units of the natural language and comprising data of a
first modality input;

further modality receiving means for receiving data
identifying events in at least one further modality
input;

data structure receiving means for receiving a
further modified data structure defining modified grammar
rules and the relationships with events in the or each
further modality, said further modified data structure
having been generated using the method of claim 18;

analysing means for analysing the first modality
input data and the or each further modality input data to
determine if they match with any said modified grammar
rule and related events in the or each further modality;
and

generating means for generating computer usable data
in dependence upon said analysis by said analysing means.

24. Apparatus according to claim 23, wherein said first
modality receiving means is adapted to receive
recognition data comprising an ordered list of likely
natural language units to accompany the most likely
natural language unit for each natural language unit
recognised, and said analysing means is adapted to use
said ordered list when the most likely natural language
units do not result in a match with any modified grammar
rule and related events in the or each further modality.

25. Apparatus according to claim 23 or claim 24, wherein
said first modality receiving means is adapted to receive
speech recognition data.

26. A method of generating data in a computer usable
form, the method comprising:

a first receiving step of receiving data generated
for a natural language input by the method of claim 21 or
claim 22, said data comprising recognised units of the
natural language, and comprising data of a first modality
input;

a second receiving step of receiving data
identifying events in at least one further modality
input;

a third receiving step of receiving a further
modified data structure defining modified grammar rules
and the relationship with events in the or each further
modality, said further modified data structure having
been generated during the method of claim 18;

analysing the first modality input data and the or
each further modality input data to determine if they
match with any said modified grammar rule and related
events in the or each further modality; and

generating computer usable data in dependence upon
said analysis.

27. A method according to claim 26, wherein the first
receiving step comprises receiving recognition data
comprising an ordered list of likely natural language
units to accompany the most likely natural language unit
for each natural language unit recognised, and the
analysis step includes using the ordered list when the
most likely natural language units do not result in a
match with any modified grammar rule and related events
in the or each further modality.
28. A method according to claim 26 or claim 27, wherein
the first receiving step receives speech recognition
data.

29. Processor implementable instructions for controlling
a processor to carry out the method of any one of claims
10 to 18, 21, 22 or 26 to 28.

30. A carrier medium for carrying the processor
implementable instructions according to claim 29.
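The fall-back to the ordered list of likely natural language units recited in claims 24 and 27 can be illustrated with a short sketch. The representation is an assumption: each modified grammar rule is paired with the ordered list of events it expects in the further modality, each recognised unit arrives as an ordered list of candidate words (most likely first), and candidate word sequences are tried in the lexicographic order produced by `itertools.product`, a simple stand-in for whatever ranking a real system would use. The function name, the rule, its words and the `click` event are hypothetical.

```python
from itertools import islice, product
from typing import Dict, List, Optional, Sequence, Tuple

# Assumed format: rule name -> (word sequence, events expected in the further modality)
Rules = Dict[str, Tuple[List[str], List[str]]]


def match_with_nbest(
    rules: Rules,
    nbest: Sequence[Sequence[str]],
    events: Sequence[str],
    max_candidates: int = 100,
) -> Optional[Tuple[str, List[str]]]:
    """Return (rule name, words) for the first candidate word sequence that
    matches a rule together with its related further-modality events.

    product() yields the all-most-likely sequence first, so the ordered
    lists are only consulted when that sequence fails to match."""
    observed = list(events)
    for candidate in islice(product(*nbest), max_candidates):
        words = list(candidate)
        for name, (rule_words, rule_events) in rules.items():
            if words == rule_words and observed == rule_events:
                return name, words
    return None
```

For example, with `rules = {"fax": (["fax", "this", "to", "him"], ["click"])}` and per-unit lists `[["fax"], ["this"], ["two", "to"], ["him"]]`, the most likely sequence `fax this two him` matches no rule, so the second entry of the third list is tried and `("fax", ["fax", "this", "to", "him"])` is returned when a `click` event was observed. The `max_candidates` cap keeps the search bounded when the lists are long.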
ABSTRACT

NATURAL LANGUAGE INPUT METHOD AND APPARATUS 

A system is disclosed for generating a modified data
structure defining modified grammar rules for the
recognition of a natural language input with pauses, in
which grammar rules for recognition of a natural language
are analysed to identify positions in the grammar rules
at which pauses can occur in the natural language. A
modified data structure is generated defining the
modified grammar rules in dependence upon the analysis.
The modified data structure is used to improve the
accuracy of recognition of a natural language input with
pauses. Where the natural language input is used in
conjunction with a second modality input, the analysis is
performed to identify the positions of pauses in
dependence upon the second modality inputs.
<Modified SR Fax Rule> =
      <SR sub-rule 1>
    | <SR sub-rule 4> <SR sub-rule 6> <SR sub-rule 3>
    | <SR sub-rule 7> <SR sub-rule 8>
    | <SR sub-rule 7> <SR sub-rule 9> <SR sub-rule 3>

FIG. 8a